ABSTRACT

The SPACER server provides an interactive frame- 
work for exploring allosteric communication in 
proteins with different sizes, degrees of oligomer- 
ization and function. SPACER uses recently 
developed theoretical concepts based on the 
thermodynamic view of allostery. It proposes 
easily tractable and meaningful measures that 
allow users to analyze the effect of ligand binding 
on the intrinsic protein dynamics. The server shows 
potential allosteric sites and allows users to explore 
communication between the regulatory and func- 
tional sites. It is possible to explore, for instance, 
potential effector binding sites in a given structure 
as targets for allosteric drugs. As input, the server 
only requires a single structure. The server is freely 
available at http://allostery.bii.a-star.edu.sg/. 

INTRODUCTION 

Protein function depends on the inherent dynamics of the 
protein structure. Not only is the balance between differ- 
ent conformational states of importance in this context, 
but also how easily the transitions between them occur. 
External factors, such as ligand binding or local
chemical modifications, can affect the conformational
ensemble and shift the equilibrium toward (in)active
conformations. The regulation is called allosteric when the
effector site is not directly adjacent to the site of altered
activity (1). The early phenomenological Monod-Wyman-Changeux
(MWC) (2) and Koshland-Nemethy-Filmer (KNF) (3) models were
devised to explain a classic
example of allosteric regulation (4): the cooperative
ligand binding of many oligomeric proteins, where 
binding of substrate to one subunit affects the ligand 
affinity in other identical subunits. The MWC model pos- 
tulates that binding stabilizes one of several available con- 
formations with emphasis on symmetry conservation, 
whereas the KNF model assumes an induced-fit 
scenario. Since the MWC and KNF models, numerous 
studies have been performed at different levels of coarse- 
graining (5). The models themselves have been expanded 
as well, and allostery is currently considered in proteins of 
different size, shape and degree of oligomerization, 
spanning from small single-domain structures to the 
large chaperones (6,7). Originally, there was an apparent 
dichotomy between MWC and KNF models and their 
counterparts in the energy landscape-based 'new view' of 
allostery (8-11) — conformational selection and induced 
fit. The main difference between the two models is 
whether binding precedes conformational change (11). 
Transition pathway analysis is primarily a matter of 
kinetics, whereas the shift in conformational equilibrium 
is one of thermodynamics: the conformational states 
involved determine which binding sites are allosterically 
connected, and their relative stability before and after 
binding determines the effect of regulation (12). Overall,
however, the two models do not describe mutually exclusive
scenarios (6,11): in both cases, there is a shift in the
population of different functional states on effector
binding. The issue was resolved with the introduction of
a more general physical framework (13).

Despite the progress achieved in the understanding of
allostery, most studies have been performed on
individual proteins or small collections of them (6,7,14).
The previously developed approaches to the analysis of 
protein dynamics are mostly focused around the analysis 
of the energetics of the protein's structural ensemble, 
mobility of individual residues and conformational 
changes. For example, the COREX/BEST algorithm (15) 
enumerates the protein ensemble, defines the relative free 
energies of each state and characterizes the energetics of 
the ensemble. The AD-ENM server performs an analysis 
of macromolecular dynamics based on the calculation of 
the spectrum of normal modes for the elastic network 
model (16). The ProDy project allows users to analyze
dynamical properties of individual residues and to visualize
protein dynamics (17). However, a general molecular
description of allosteric regulation that allows prediction of
allosteric sites based on protein dynamics, and that
explains molecular mechanisms of communication
between sites was still lacking (5). Resorting to the
thermodynamic view of allostery (5-7), we developed the 
concepts of binding leverage and leverage coupling that 
allow quantifying (i) the coupling between ligand binding 
and the intrinsic dynamics of the protein and (ii) the com- 
munication between different binding sites. These 
concepts also allow finding latent effector binding sites, 
which along with known ones can be considered as poten- 
tial targets for allosteric drugs (5). 

In the era of structural proteomics, with an exploding 
number of protein structures, it is of crucial importance to 
have instruments that allow massive and efficient analysis 
of multiple protein targets. For studying allostery, there 
are several important requirements for such an instru- 
ment. It should be based on a generic molecular model 
of allostery, which works regardless of the size, degree of 
oligomerization or function of the protein. It should work
with a single structure, regardless of whether it corresponds
to the active/activated or inactive/inactivated state of the
protein. It should be able to explore communication
between natural allosteric and catalytic sites, to detect 
latent sites in the structure, as well as to analyze sites 
chosen by the user. The SPACER server satisfies the
aforementioned requirements, providing reasonably fast
interactive tools for exploratory analysis of allosteric
communication. Below, we provide a brief description
of the theoretical background for SPACER's
methods, followed by a practical guide to exploratory
analysis of allosteric communication with SPACER. An
online tutorial (http://allostery.bii.a-star.edu.sg/tutorial/)
exemplifies the server workflow for the case of the
phosphofructokinase (PFK) homotetramer, showing the
major options in SPACER and explaining the most
important features and results.



THEORETICAL BACKGROUND 

The balance between different conformations of a protein 
and the role of ligand binding in switching between its 
functional states are the major determinants of allosteric 
regulation and communication. The steps in the analysis
of allostery in a given protein structure should therefore
include: (i) prediction and characterization of the
substrate- and effector-binding sites; (ii) characterization
of coupling between the ligand binding and functional
dynamics to determine regulatory sites; and (iii) analysis
of the communication between the allosteric and catalytic
sites, both known and latent ones, as well as sites of
interest designated by the user. Below is a
brief description of the major methods used in
SPACER. It includes two instruments for the search of
allosteric sites: 'local closeness' (14), based on static
geometric features of the structure, and 'binding leverage' (6),
based on protein dynamics measures. It also includes
'leverage coupling' (7), which quantifies communication
between the allosteric and catalytic sites in the protein.

Local closeness 

Local closeness (14) is a geometry-based predictor of 
ligand-binding sites involved in protein function and regu- 
lation. It detects potential allosteric sites that are not ne- 
cessarily characterized by specific chemical groups (as in 
the catalytic residues) selected in evolution and manifested 
in sequence conservation. Local closeness is a local cen- 
trality measure. It quantifies residue connectivity to neigh- 
bors within a finite distance m in the residue interaction 
graph (RIG), where each residue in the protein is a node. 
The RIG is built on the van der Waals contacts between 
atoms (including hydrogen) of the amino acid residues. 
The local closeness of degree m for a node is defined as

$$C = \sum_{k=1}^{m} \frac{n_k}{k^2},$$

where $n_k$ is the number of nodes whose shortest distance
from a given node is exactly k. The local closeness of
degree four (m = 4) gives the best performance of
predictions (14). This value of m effectively means that only the
residues closer than 30-40 Å are included in the
calculation, which roughly corresponds to the size scale of single
domains. It is recommended, therefore, to use m = 4 in
most of the calculations. Smaller values of m are
recommended when only small cavities on the surface are to be
investigated.
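
As a minimal illustrative sketch (not the server's C++ implementation), the shell counting behind local closeness can be written as a breadth-first search over a residue interaction graph. The function name `local_closeness`, the adjacency-dictionary input and the 1/k^2 weighting of each shell are assumptions made for this example.

```python
from collections import deque

def local_closeness(graph, node, m=4):
    """Sum n_k / k**2 over shells k = 1..m around `node`.

    `graph` maps each node to an iterable of its neighbors (hypothetical
    input format); n_k is the number of nodes at shortest distance k.
    """
    dist = {node: 0}
    shell_counts = [0] * (m + 1)
    queue = deque([node])
    while queue:
        u = queue.popleft()
        if dist[u] == m:
            continue  # shells beyond m do not contribute
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                shell_counts[dist[v]] += 1
                queue.append(v)
    return sum(shell_counts[k] / k**2 for k in range(1, m + 1))

# Toy path graph a-b-c-d-e standing in for a residue interaction graph.
toy_rig = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
           "d": ["c", "e"], "e": ["d"]}
central = local_closeness(toy_rig, "c", m=2)   # two nodes in each shell
terminal = local_closeness(toy_rig, "a", m=2)  # one node in each shell
```

In this toy graph the central node scores higher than the terminal one, which is the behavior expected of a centrality measure.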

Binding leverage 

Binding leverage (6) measures the ability of a binding site 
to couple to the intrinsic motions of a protein by quan- 
tifying the cost of the binding site deformation when a 
ligand is present and is resisting the motion. Potential 
binding sites are found by a coarse-grained docking pro- 
cedure [see 'Implementation' section and reference (6)]. 
Conformational changes are approximated here by
low-frequency Cα normal modes. The binding leverage $L_A$
for a set of normal modes A is calculated as

$$L_A = \sum_{\mu \in A} \Delta U_\mu ,$$

where $\Delta U_\mu$ represents the total change in potential
energy of a set of springs owing to the motion of a normal mode μ.
Springs of length $d_{ij}$ are placed between all pairs of Cα
atoms i and j whose connecting line passes within 3.5 Å
of any ligand atom (k is an arbitrary spring constant),
giving the following expression for $\Delta U_\mu$:

$$\Delta U_\mu = \sum_{ij} \frac{k}{2} \left( \Delta d_{ij}^{\mu} \right)^2 ,$$

where $\Delta d_{ij}^{\mu}$ is the change in the spring length
$d_{ij}$ caused by the mode μ.

The binding leverage of a site depends both on the range
of the motion at the site and on how many pairs of residues
interact with the ligand. A ligand that binds to a site with
high binding leverage has the potential to lock one or more
collective degrees of freedom (represented here by normal
modes).
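
The spring construction can be illustrated with a short NumPy sketch. It is not the server's C code: the function names, the unit-amplitude displacement along each mode and the harmonic form (k/2)Δd² of the spring energy are assumptions made for this example.

```python
from itertools import combinations
import numpy as np

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    ab = b - a
    denom = ab @ ab
    t = 0.0 if denom == 0 else np.clip(((p - a) @ ab) / denom, 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def ligand_springs(ca, ligand, cutoff=3.5):
    """Pairs (i, j) of C-alpha atoms whose connecting line passes
    within `cutoff` angstroms of any ligand atom."""
    return [(i, j) for i, j in combinations(range(len(ca)), 2)
            if any(point_segment_distance(p, ca[i], ca[j]) <= cutoff
                   for p in ligand)]

def delta_u(ca, mode, springs, k=1.0):
    """Spring energy change for one mode (assumed harmonic, k/2 * dd**2)."""
    moved = ca + mode  # assumption: unit-amplitude displacement
    total = 0.0
    for i, j in springs:
        d0 = np.linalg.norm(ca[i] - ca[j])
        d1 = np.linalg.norm(moved[i] - moved[j])
        total += 0.5 * k * (d1 - d0) ** 2
    return total

def binding_leverage(ca, modes, springs, k=1.0):
    """Sum of the spring energy changes over a set of modes."""
    return sum(delta_u(ca, mode, springs, k) for mode in modes)

# Toy system: three C-alpha atoms and a one-atom ligand near the 0-1 line.
ca = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
ligand = np.array([[5.0, -1.0, 0.0]])
opening = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
translation = np.ones((3, 3))  # rigid shift: no spring is stretched
springs = ligand_springs(ca, ligand)
leverage = binding_leverage(ca, [opening, translation], springs)
```

Only the opening motion stretches the spring that crosses the ligand, so it alone contributes to the leverage of this toy site.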

Leverage coupling 

We introduced the concept of leverage coupling to provide 
a quantitative characteristic of allosteric communication 
(7). It is based on the assumption that sites that have high 
binding leverage for the same motion are more likely to be 
allosterically coupled than sites that only have high 
binding leverage for motion along independent degrees 
of freedom. 

The strength of communication between two sites P and
Q is defined as a dot product of binding leverages,
$\boldsymbol{\lambda}_P$ and $\boldsymbol{\lambda}_Q$, of these sites:

$$D_{PQ} = \boldsymbol{\lambda}_P \cdot \boldsymbol{\lambda}_Q .$$

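The vector of the binding leverage of site P is defined as
$\boldsymbol{\lambda}_P = (\lambda_P^{1}, \ldots, \lambda_P^{n})$,
with one component per normal mode, where $\lambda_P^{\mu}$ is the
binding leverage of the site P caused by the normal mode μ:

$$\lambda_P^{\mu} = \frac{\Delta U_\mu}{|P|},$$

where $\Delta U_\mu$ is evaluated for the site P and the norm of P,
$|P|$, is the number of elements in the set.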
The normalized leverage coupling $C_{PQ}$,

$$C_{PQ} = \frac{D_{PQ}}{\sqrt{D_{PP} D_{QQ}}},$$

has the range $0 \le C_{PQ} \le 1$. The normalized leverage
coupling $C_{PQ}$ is necessary for the analysis of big molecular
machines like chaperones (7), where the conformational
change at the binding sites is small compared with the
large-scale functional motions. In this case, the task is to
compare the values between different sites and to find the
most correlated pairs of sites for a given protein. The
measure $C_{PQ}$ is thus used (instead of $D_{PQ}$) to analyze how
binding sites are correlated with different modes of
functional motion. As a result, leverage coupling allows one to
investigate allosteric communication in enzymes regulated
by ligand binding and phosphorylation in proteins with
different sizes and degrees of oligomerization (7).
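
A small NumPy sketch may help to see how the two measures behave. The per-mode leverage vectors below are invented toy numbers, the function names are hypothetical, and the normalization by the square root of the two self-couplings is an assumption chosen because it keeps the value between 0 and 1 for non-negative leverages.

```python
import numpy as np

def leverage_coupling(lam_p, lam_q):
    """Dot product of the per-mode binding leverage vectors of two sites."""
    return float(np.dot(lam_p, lam_q))

def normalized_coupling(lam_p, lam_q):
    """Coupling divided by sqrt of the two self-couplings (assumed form)."""
    d_pq = np.dot(lam_p, lam_q)
    d_pp = np.dot(lam_p, lam_p)
    d_qq = np.dot(lam_q, lam_q)
    return float(d_pq / np.sqrt(d_pp * d_qq))

# Toy leverages over three modes (illustrative values only).
site_p = np.array([3.0, 0.0, 1.0])  # mostly mode 1
site_q = np.array([1.0, 0.0, 0.0])  # mode 1 only, small amplitude
site_r = np.array([0.0, 2.0, 0.0])  # mode 2 only

d_pq = leverage_coupling(site_p, site_q)
c_pq = normalized_coupling(site_p, site_q)
c_pr = normalized_coupling(site_p, site_r)
```

Sites P and Q share a mode and are strongly coupled after normalization even though Q has a small leverage, whereas P and R respond to independent modes and have zero coupling.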

IMPLEMENTATION 

The SPACER server is written in Python using the
Pyramid framework (http://www.pylonsproject.org/).
The server has a modular distributed architecture,
in which the modules communicate via the Advanced
Message Queuing Protocol and are asynchronously
connected by a Celery distributed task queue
(http://www.celeryproject.org/). The docking and binding leverage
modules are implemented in C, and local closeness in C++.
The interactive web interface is powered by the
JavaScript libraries jQuery (http://www.jquery.com/) and
d3js (http://d3js.org/), and by the Jmol molecular
viewer (http://www.jmol.org/) for visualizing the protein
structures. The normal mode analysis is done on Cα
elastic networks using the Molecular Modeling Toolkit
(18). The SPACER server interacts with the Protein
Data Bank (19) and PDBePISA (20) on the fly.



USING THE SPACER SERVER 

The server is designed for interactive exploratory analysis 
of protein structures. Therefore, there is no special pro- 
grammatic interface for automatic execution of tasks. 
However, we provide machine-readable output files for 
every step of the analysis workflow. 

The binding leverage calculations, including normal
mode analysis and docking, may take a significant
amount of time depending on the size of the protein
complex (about an hour on average, up to 40 h for
chaperones). To minimize the waiting time, we have
precalculated results for many proteins with the default parameters.
The session menu in the SPACER interface indicates the
status of jobs, whether they are ready or still running.
There is also a link for restoring the session should the
user decide to return to the results later. In general, writing
down the session ID (five or six characters) should be
sufficient for switching back to the analysis at any time
or sharing the results with colleagues. It is only possible to
work with one structure at a time in a session. The session
is stored for at least 6 months from the moment of last
access.

To exemplify the work and major options provided by
the server, we use the tetrameric enzyme PFK, a
classic example of allostery. The enzyme
is allosterically inhibited by phosphoenolpyruvate
and activated by ADP binding to the same site. It is
cooperative with respect to binding of one of its two substrates,
fructose-6-phosphate (F6P), in the presence of
phosphoenolpyruvate. We use the crystal structure in complex
with the allosteric activator ADP (PDB ID 4pfk).
What biological conclusions can the user draw with the
help of SPACER? First, the sites with highest local
closeness and binding leverage correspond to known
ligand-binding sites in PFK (Figure 1). Second, Figure 2a
shows that there is weak communication between the
ADP-binding site and F6P-binding active sites.
However, the communication between different
ADP-binding sites is stronger. Third, Figure 2b shows strong
communication between the F6P-binding (functional)
sites and weak communication of the functional sites
with the ADP-binding (regulatory) sites. Finally, the
color-coded matrix $D_{PQ}$ shows an overview of the levels
of pairwise communication between all defined sites
(Figure 2c). The values are normalized to the interval
from zero (no communication, blue) to one (strongest
measured communication, red); white corresponds
to weak communication. The matrix is interactive; the
selected pair of sites is highlighted by color in the
[Figure 1, panel (a) screenshot: SPACER session for 4pfk showing the 'Functional and Allosteric Sites' list (columns: #, Type, Description) with F6P-binding sites 1-4 and ADP-binding sites 5-8, and the tool for adding a user-selected surface patch as a site.]

Figure 1. Exploring the effector and catalytic sites. SPACER screenshots: (a) the list of sites, one selected site, and the tool to add
user-selected sites interactively; (b) the local closeness tool showing the results for phosphofructokinase (PFK, PDB ID 4pfk) in two projections;
(c) the binding leverage tool showing the results for PFK in two projections.
Figure 2. Exploring the allosteric communication between the sites. SPACER screenshots: (a) leverage coupling between an ADP-binding
(activator) site in PFK and the rest of the structure; (b) leverage coupling between an F6P-binding site and the rest of the structure; (c) the
$D_{PQ}$ matrix showing communication between eight sites in PFK (four of each type): F6P-binding sites (1-4) and ADP-binding sites (5-8); the
last row and column in the matrix designate communication with the rest of the structure (background, BG). The values are color-coded from blue
(0) to red (1) via white (0.5). A pair of allosteric sites (3 and 7) is selected and highlighted with a green border in the matrix; (d) the selected
pair of sites (orange and green) shown directly on the protein structure.
molecular viewer (Figure 2d). Importantly, the matrix also
shows the level of communication between the sites and
parts of the structure not belonging to any site
(background, abbreviated as BG). The background
communication is used as a control and can be subtracted from the
values by switching the matrix display mode. The analysis
of large protein complexes such as chaperones requires a
special normalization ($C_{PQ}$), as the values may not be
directly comparable between the subunits and within the
subunits. The $C_{PQ}$ matrix can be shown for any structure
by activating the respective display mode. Below is a
step-by-step description of the server's options and the outputs
of calculations, with explanations of the illustrations and
downloadable data.

Input 

The analysis of allosteric communication requires a
protein structure representing the biologically relevant
assembly, that is, a protein complex in its natural oligomeric
configuration. The easiest way to start is to provide a PDB
ID. SPACER will then try to find the most probable
biological assembly in the PISA and PDB databases. In the case
of the PFK enzyme (PDB ID 4pfk), the best assembly is
retrieved from PISA and displayed in the embedded
Jmol viewer on the Biological Assembly Page (see
tutorial for illustrations). If no assembly is found, the
structure will be fetched from the Protein Data Bank as
is. Alternatively, the user can provide the atomic
coordinates in PDB format. The only requirement is that the file
should have consistent residue/atom naming and
numbering, according to the standard PDB format. SPACER will
show the assembly structure for visual control with the
subunits in different colors. At this stage, it is possible
to remove some of the protein chains, in case only
a part of the protein complex needs to be analyzed (for
instance, a subunit of a chaperone). Removing
chains might affect or even significantly change the
allosteric communication compared with the picture obtained
for the native protein structure.

The sites 

The SPACER server analyzes allosteric communication
between sites in protein structures. Therefore, it is
essential to identify the sites. If the ligands are present in the
PDB file, the corresponding catalytic and effector-binding
sites are added automatically. Otherwise, the sites can be
added manually. The local closeness and binding leverage
tools can help in identifying potential ligand-binding and
functional sites. We provide an interactive way of
defining the sites of interest manually by clicking on
any atom on the protein surface (Figure 1a). The site
will include the residues located close to the selected
one (the radius can be adjusted). It is also possible to
edit the list of residues by specifying the chain name and
index of each residue. In the case of PFK described in the
tutorial, the user is provided with the list of four ADP-
and four F6P-binding sites extracted from the given PDB
structure (Figure 1a). Once the sites are defined, the
allosteric communication between them can be explored using
leverage coupling.
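
The radius-based site definition can be pictured with a minimal sketch. It assumes, hypothetically, that a site is collected from residues whose Cα atom lies within a cutoff of the Cα atom of the selected residue; the function name `site_around`, the dictionary layout and the toy coordinates are illustrative and not taken from the server.

```python
import math

def site_around(ca_coords, center, radius=5.0):
    """Residues whose C-alpha atom is within `radius` angstroms of the
    C-alpha atom of the selected residue (assumed selection rule).

    `ca_coords` maps (chain, residue number) to an (x, y, z) tuple.
    """
    origin = ca_coords[center]
    return sorted(res for res, xyz in ca_coords.items()
                  if math.dist(origin, xyz) <= radius)

# Toy coordinates; residue numbers reuse labels from the PFK example only
# for readability, the positions are invented.
toy_ca = {
    ("A", 9): (0.0, 0.0, 0.0),
    ("A", 41): (3.0, 0.0, 0.0),
    ("A", 72): (0.0, 4.9, 0.0),
    ("A", 104): (6.0, 0.0, 0.0),
}
default_site = site_around(toy_ca, ("A", 9))
wider_site = site_around(toy_ca, ("A", 9), radius=7.0)
```

Increasing the radius enlarges the site, which mirrors the adjustable radius offered in the interface.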



The 'local closeness' tool requires only a single
parameter: the degree m of local closeness on the RIG, set by default to four. The
results are shown as a colored surface, where the highest
values (red) correspond to the potential binding sites
(Figure 1b). The user can add the sites to the list by
clicking on the red patches with high values. The local
closeness results can be downloaded as a PDB file with
the values stored as B-factors or as a table in plain text
(tab-separated CSV format). Two orientations of the PFK
structure are shown in Figure 1b with surfaces gradually
colored from blue to red depending on the value of the
local closeness. As a control of recall of known binding
sites, we show the averaged values of local closeness
alongside the defined sites. Using the values for the known sites
as a ground level makes it easier to identify and add the
potential binding sites of interest manually.
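
For offline work with a downloaded PDB file, the per-residue values can be read back from the B-factor column. The sketch below relies only on the fixed-width ATOM/HETATM columns of the standard PDB format; the function names, the averaging over the atoms of a residue and the toy records are assumptions for illustration.

```python
from collections import defaultdict

def residue_scores(pdb_text):
    """Average the B-factor column per residue of a PDB-format text."""
    values = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        resname = line[17:20].strip()  # columns 18-20: residue name
        chain = line[21]               # column 22: chain identifier
        resnum = int(line[22:26])      # columns 23-26: residue number
        bfactor = float(line[60:66])   # columns 61-66: B-factor field
        values[(chain, resnum, resname)].append(bfactor)
    return {res: sum(v) / len(v) for res, v in values.items()}

def top_residues(scores, n=3):
    """Residues with the highest stored values, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def atom_line(serial, name, resname, chain, resnum, b):
    """Build a toy ATOM record with standard fixed-width columns."""
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}")

# Toy file with invented values stored in the B-factor column.
toy_pdb = "\n".join([
    atom_line(1, "N", "GLY", "A", 170, 0.90),
    atom_line(2, "CA", "GLY", "A", 170, 0.90),
    atom_line(3, "CA", "ARG", "A", 171, 0.20),
    atom_line(4, "CA", "HIS", "M", 249, 0.50),
    "END",
])
scores = residue_scores(toy_pdb)
best_two = top_residues(scores, n=2)
```

The same parsing applies to the binding leverage and leverage coupling downloads, which are described as using the B-factor field as well.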

The 'binding leverage' tool uses Monte Carlo docking
simulations to probe the surface of the protein (6). The
main parameter is the probe size: the probe is modeled as
a peptide with 2-6 Cα atoms. The default probe size is four
Cα atoms. The probe should be small enough to fit the
cavities, but large enough not to get buried in sites not
accessible to the real ligands. Binding leverage effectively
measures the coupling between probe binding and the
deformations described by the lowest frequency normal
modes. Therefore, the effect of probe size on the
calculation results can be significant. After changing the
parameter, a recalculation of the probe docking is needed, which
might take some time, depending on protein size.
Importantly, in the search for ligand-binding sites, the
ligands already existing in the structure are excluded.
Binding leverage results for PFK are shown as a colored
surface (Figure 1c) and can be downloaded as a plain-text
table or a PDB file with B-factors. It is possible to add
further sites of interest while exploring the binding
leverage results. As in the case of local closeness, the
averaged values of binding leverage are also shown
alongside the already listed sites, making it easier to add new
sites based on the leverage values.

Allosteric communication 

Exploring the allosteric communication is the final step in
SPACER, achieved by calculating the leverage coupling.
There are two ways of showing leverage coupling: (i)
communication between a given site and the rest of the protein
structure, and (ii) communication between pairs of annotated
sites. In the former case (shown for PFK, Figure 2a and
b), the results are shown as a colored surface and can be
downloaded as a PDB file with B-factors or as a plain-text
table. In the latter case (Figure 2c), the results are shown
in the form of a colored symmetric matrix: $D_{PQ}$, with
options for the background analysis, and optionally $C_{PQ}$
for the analysis of big molecular machines (see also
Theoretical background). Each cell in the matrix corresponds to
a pair of sites P and Q, and the color shows the strength of
allosteric communication between the sites. The
interactive tool will show a pair of communicating sites on
the structure once the user clicks on the corresponding
cell in the matrix (shown for PFK, Figure 2c and d).
The last row and column in the matrices show the
leverage coupling with the rest of the structure
(background). The background is defined as the residues not
included in any of the described sites. If the background
value is high, it may indicate that some potential
effector-binding/catalytic sites are not listed and should be added.
The resulting matrices can be downloaded in
machine-readable JSON format.
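
A downloaded matrix can be post-processed with a few lines of Python. The JSON layout used below (a `labels` list ending with the background label `BG` and a square `matrix` list) is a hypothetical stand-in, as the actual schema of the server's files is not described here; the values are invented.

```python
import json

def strongest_pairs(json_text, background_label="BG", n=3):
    """Rank site pairs by coupling, skipping the background row/column.

    Assumes a hypothetical layout: {"labels": [...], "matrix": [[...]]}.
    """
    data = json.loads(json_text)
    labels, matrix = data["labels"], data["matrix"]
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):  # upper triangle only
            if background_label in (labels[i], labels[j]):
                continue
            pairs.append((matrix[i][j], labels[i], labels[j]))
    pairs.sort(reverse=True)
    return [(a, b, value) for value, a, b in pairs[:n]]

# Toy symmetric matrix for three sites plus background (invented values).
toy_json = json.dumps({
    "labels": ["1", "5", "6", "BG"],
    "matrix": [
        [1.0, 0.2, 0.3, 0.1],
        [0.2, 1.0, 0.8, 0.1],
        [0.3, 0.8, 1.0, 0.1],
        [0.1, 0.1, 0.1, 1.0],
    ],
})
ranking = strongest_pairs(toy_json, n=2)
```

Ranking the off-diagonal cells in this way is one simple route to shortlisting the most strongly communicating pairs before inspecting them in the viewer.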



CONCLUSIONS AND OUTLOOK 

The SPACER server establishes an interactive exploratory 
framework for finding allosteric and catalytic ligand- 
binding sites and for analyzing the communication 
between them. SPACER implements a unique approach 
that provides simple and meaningful physics-based 
quantities for characterizing the link between structural 
dynamics and binding of effector molecules. It is applic- 
able to a wide range of proteins — from small monomeric 
structures to large protein complexes (6,7). Importantly, 
it works with a single crystal structure, which is sufficient 
for representing the conformational ensemble of the 
protein (6). 

The results of the analysis provided by SPACER can be 
used to study many different areas, such as protein 
function, regulation and evolution of protein function, 
protein dynamics, X-ray and nuclear magnetic resonance 
analysis, drug design and so forth. In particular, the pre- 
sented approach allows one to detect latent allosteric sites 
or analyze user-selected sites of interest, which can be 
further explored as potential targets for allosteric drugs 
(5,21). 